An intelligent modular real-time vision-based system for environment perception
A significant portion of driving hazards is caused by human error and
disregard for local driving regulations; consequently, an intelligent
assistance system can be beneficial. This paper proposes a novel vision-based
modular package to ensure drivers' safety by perceiving the environment. Each
module is designed based on accuracy and inference time to deliver real-time
performance. As a result, the proposed system can be implemented on a wide
range of vehicles with minimum hardware requirements. Our modular package
comprises four main sections: lane detection, object detection, segmentation,
and monocular depth estimation. Each section is accompanied by novel techniques
to improve the accuracy of others along with the entire system. Furthermore, a
GUI is developed to display perceived information to the driver. In addition to
using public datasets, like BDD100K, we have also collected and annotated a
local dataset that we utilize to fine-tune and evaluate our system. We show
that the accuracy of our system is above 80% in all the sections. Our code and
data are available at
https://github.com/Pandas-Team/Autonomous-Vehicle-Environment-Perception
Comment: Accepted in NeurIPS 2022 Workshop on Machine Learning for Autonomous
Driving
INCODE: Implicit Neural Conditioning with Prior Knowledge Embeddings
Implicit Neural Representations (INRs) have revolutionized signal
representation by leveraging neural networks to provide continuous and smooth
representations of complex data. However, existing INRs face limitations in
capturing fine-grained details, handling noise, and adapting to diverse signal
types. To address these challenges, we introduce INCODE, a novel approach that
enhances the control of the sinusoidal-based activation function in INRs using
deep prior knowledge. INCODE comprises a harmonizer network and a composer
network, where the harmonizer network dynamically adjusts key parameters of the
activation function. Through a task-specific pre-trained model, INCODE adapts
the task-specific parameters to optimize the representation process. Our
approach not only excels in representation, but also extends its prowess to
tackle complex tasks such as audio, image, and 3D shape reconstructions, as
well as intricate challenges such as neural radiance fields (NeRFs), and
inverse problems, including denoising, super-resolution, inpainting, and CT
reconstruction. Through comprehensive experiments, INCODE demonstrates its
superiority in terms of robustness, accuracy, quality, and convergence rate,
broadening the scope of signal representation. Please visit the project's
website for details on the proposed method and access to the code.
Comment: Accepted at WACV 2024 conference
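As a rough illustration of the parameterized sinusoidal activation mentioned in the INCODE abstract, the sketch below applies a sine activation whose amplitude, frequency scale, phase, and offset are supplied from outside. The form a*sin(b*omega_0*x + c) + d, the function name `modulated_sine`, and all numeric values are assumptions made for illustration; in INCODE these parameters would be produced by the harmonizer network, which is not modelled here.

```python
import numpy as np

def modulated_sine(x, a=1.0, b=1.0, c=0.0, d=0.0, omega_0=30.0):
    """Sine activation with externally supplied parameters (assumed form).

    a: amplitude, b: frequency scale, c: phase shift, d: offset.
    omega_0 is a fixed base frequency (hypothetical default).
    """
    return a * np.sin(b * omega_0 * x + c) + d

x = np.linspace(-1.0, 1.0, 5)

# Fixed parameters: an ordinary sinusoidal activation.
plain = modulated_sine(x)

# Different parameters change amplitude, frequency, phase, and offset,
# which is the kind of control a conditioning network could exercise.
shifted = modulated_sine(x, a=0.5, b=2.0, c=np.pi / 2, d=0.1)
```

With a = b = 1 and c = d = 0 the function reduces to a plain sine activation with base frequency omega_0.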
Self-supervised Semantic Segmentation: Consistency over Transformation
Accurate medical image segmentation is of utmost importance for enabling
automated clinical decision procedures. However, prevailing supervised deep
learning approaches for medical image segmentation encounter significant
challenges due to their heavy dependence on extensive labeled training data. To
tackle this issue, we propose a novel self-supervised algorithm,
S-Net, which integrates a robust framework based on the proposed
Inception Large Kernel Attention (I-LKA) modules. This architectural
enhancement makes it possible to comprehensively capture contextual information
while preserving local intricacies, thereby enabling precise semantic
segmentation. Furthermore, considering that lesions in medical images often
exhibit deformations, we leverage deformable convolution as an integral
component to effectively capture and delineate lesion deformations for superior
object boundary definition. Additionally, our self-supervised strategy
emphasizes the acquisition of invariance to affine transformations, which are
commonly encountered in medical scenarios. This emphasis on robustness with
respect to geometric distortions significantly enhances the model's ability to
accurately model and handle such distortions. To enforce spatial consistency
and promote the grouping of spatially connected image pixels with similar
feature representations, we introduce a spatial consistency loss term. This
aids the network in effectively capturing the relationships among neighboring
pixels and enhancing the overall segmentation quality. The S-Net approach
iteratively learns pixel-level feature representations for image content
clustering in an end-to-end manner. Our experimental results on skin lesion and
lung organ segmentation tasks show the superior performance of our method
compared to the SOTA approaches. https://github.com/mindflow-institue/SSCT
Comment: Accepted in ICCV 2023 workshop CVAM
Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision
Foundation models, large-scale pre-trained deep-learning models adapted to a
wide range of downstream tasks, have gained significant interest lately in
various deep-learning problems, which are undergoing a paradigm shift with the
rise of these models. Trained on large-scale datasets to bridge the gap between
different modalities, foundation models facilitate contextual reasoning,
generalization, and prompt capabilities at test time. The predictions of these
models can be adjusted for new tasks by augmenting the model input with
task-specific hints called prompts without requiring extensive labeled data and
retraining. Capitalizing on the advances in computer vision, medical imaging
has also marked a growing interest in these models. To assist researchers in
navigating this direction, this survey intends to provide a comprehensive
overview of foundation models in the domain of medical imaging. Specifically,
we initiate our exploration by providing an exposition of the fundamental
concepts forming the basis of foundation models. Subsequently, we offer a
methodical taxonomy of foundation models within the medical domain, proposing a
classification system primarily structured around training strategies, while
also incorporating additional facets such as application domains, imaging
modalities, specific organs of interest, and the algorithms integral to these
models. Furthermore, we emphasize the practical use case of some selected
approaches and then discuss the opportunities, applications, and future
directions of these large-scale pre-trained models for analyzing medical
images. In the same vein, we address the prevailing challenges and research
pathways associated with foundational models in medical imaging. These
encompass the areas of interpretability, data management, computational
requirements, and the nuanced issue of contextual comprehension.
Comment: The paper is currently in the process of being prepared for
submission to MI
Diffusion Models for Medical Image Analysis: A Comprehensive Survey
Denoising diffusion models, a class of generative models, have garnered
immense interest lately in various deep-learning problems. A diffusion
probabilistic model defines a forward diffusion stage where the input data is
gradually perturbed over several steps by adding Gaussian noise and then learns
to reverse the diffusion process to retrieve the desired noise-free data from
noisy data samples. Diffusion models are widely appreciated for their strong
mode coverage and quality of the generated samples despite their known
computational burdens. Capitalizing on the advances in computer vision, the
field of medical imaging has also observed a growing interest in diffusion
models. To help researchers navigate this profusion, this survey intends to
provide a comprehensive overview of diffusion models in the discipline of
medical image analysis. Specifically, we introduce the solid theoretical
foundation and fundamental concepts behind diffusion models and the three
generic diffusion modelling frameworks: diffusion probabilistic models,
noise-conditioned score networks, and stochastic differential equations. Then,
we provide a systematic taxonomy of diffusion models in the medical domain and
propose a multi-perspective categorization based on their application, imaging
modality, organ of interest, and algorithms. To this end, we cover extensive
applications of diffusion models in the medical domain. Furthermore, we
emphasize the practical use case of some selected approaches, and then we
discuss the limitations of the diffusion models in the medical domain and
propose several directions to fulfill the demands of this field. Finally, we
gather the overviewed studies with their available open-source implementations
at
https://github.com/amirhossein-kz/Awesome-Diffusion-Models-in-Medical-Imaging.
Comment: Second revision: including more papers and further discussion
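The forward diffusion stage described in this abstract, in which data is gradually perturbed with Gaussian noise, can be sketched for the diffusion probabilistic model case as follows. This is a minimal sketch under stated assumptions: the linear noise schedule, the 1000-step horizon, the function name `forward_diffuse`, and the random 8x8 array standing in for an image are all illustrative choices, not details taken from the survey.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t from q(x_t | x_0) in closed form for a DDPM-style process."""
    # Cumulative product of (1 - beta) gives the remaining signal fraction.
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return x_t, noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)       # assumed linear noise schedule
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))    # stand-in for a clean image

x_early, noise_early = forward_diffuse(x0, 10, betas, rng)   # still close to x0
x_late, noise_late = forward_diffuse(x0, 999, betas, rng)    # almost pure noise
```

A denoising network would be trained to predict `noise` from `x_t` and `t`; given a perfect prediction, `x0` can be recovered by inverting the closed-form expression above.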
Unlocking Fine-Grained Details with Wavelet-based High-Frequency Enhancement in Transformers
Medical image segmentation is a critical task that plays a vital role in
diagnosis, treatment planning, and disease monitoring. Accurate segmentation of
anatomical structures and abnormalities from medical images can aid in the
early detection and treatment of various diseases. In this paper, we address
the local feature deficiency of the Transformer model by carefully re-designing
the self-attention map to produce accurate dense prediction in medical images.
To this end, we first apply the wavelet transformation to decompose the input
feature map into low-frequency (LF) and high-frequency (HF) subbands. The LF
segment is associated with coarse-grained features while the HF components
preserve fine-grained features such as texture and edge information. Next, we
reformulate the self-attention operation using the efficient Transformer to
perform both spatial and context attention on top of the frequency
representation. Furthermore, to intensify the importance of the boundary
information, we impose an additional attention map by creating a Gaussian
pyramid on top of the HF components. Moreover, we propose a multi-scale context
enhancement block within skip connections to adaptively model inter-scale
dependencies to overcome the semantic gap among stages of the encoder and
decoder modules. Through comprehensive experiments, we demonstrate the
effectiveness of our strategy on multi-organ and skin lesion segmentation
benchmarks. The implementation code will be available upon acceptance.
https://github.com/mindflow-institue/WaveFormer
Comment: Accepted in MICCAI 2023 workshop MLM
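The decomposition of a feature map into low-frequency and high-frequency subbands described in this abstract can be illustrated with a single-level 2D Haar transform. The choice of the Haar wavelet, the function name `haar_dwt2`, and the subband names are assumptions for illustration; the abstract does not state which wavelet the authors use.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform (even height and width)."""
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0           # LF: coarse, smoothed content
    detail_rows = (a + b - c - d) / 2.0   # HF: change between rows
    detail_cols = (a - b + c - d) / 2.0   # HF: change between columns
    detail_diag = (a - b - c + d) / 2.0   # HF: diagonal change
    return low, (detail_rows, detail_cols, detail_diag)

feature = np.arange(16, dtype=float).reshape(4, 4)  # toy feature map
lf, hf = haar_dwt2(feature)

flat = np.full((4, 4), 3.0)
flat_lf, flat_hf = haar_dwt2(flat)  # a constant map has no HF content
```

Each subband has half the spatial resolution of the input, and the transform preserves the total energy of the feature map, so no information is lost by splitting it this way.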
Laplacian-Former: Overcoming the Limitations of Vision Transformers in Local Texture Detection
Vision Transformer (ViT) models have demonstrated a breakthrough in a wide
range of computer vision tasks. However, compared to the Convolutional Neural
Network (CNN) models, it has been observed that the ViT models struggle to
capture high-frequency components of images, which can limit their ability to
detect local textures and edge information. As abnormalities in human tissue,
such as tumors and lesions, may greatly vary in structure, texture, and shape,
high-frequency information such as texture is crucial for effective semantic
segmentation tasks. To address this limitation in ViT models, we propose a new
technique, Laplacian-Former, that enhances the self-attention map by adaptively
re-calibrating the frequency information in a Laplacian pyramid. More
specifically, our proposed method utilizes a dual attention mechanism that
combines efficient attention and frequency attention: the efficient attention
mechanism reduces the complexity of self-attention to linear while producing
the same output, and the frequency attention selectively intensifies the
contribution of shape and texture
features. Furthermore, we introduce a novel efficient enhancement multi-scale
bridge that effectively transfers spatial information from the encoder to the
decoder while preserving the fundamental features. We demonstrate the efficacy
of Laplacian-Former on multi-organ and skin lesion segmentation tasks with
+1.87% and +0.76% Dice scores compared to SOTA approaches, respectively. Our
implementation is publicly available at
https://github.com/mindflow-institue/Laplacian-Former
Comment: Accepted in the main conference MICCAI 202
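The claim that efficient attention reaches linear complexity while producing the same output can be checked numerically. The sketch below assumes the common efficient-attention formulation in which queries are softmax-normalized over feature channels and keys over tokens; under that assumption, reordering the matrix products avoids the n x n attention map without changing the result. The function names and tensor sizes are illustrative, and the equivalence shown is to this normalized form, not to standard softmax(QK^T) attention.

```python
import numpy as np

def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def efficient_attention(q, k, v):
    """Aggregate K^T V first (d_k x d_v), then apply Q: linear in token count."""
    q_n = softmax(q, axis=1)   # each query normalized over channels
    k_n = softmax(k, axis=0)   # each key channel normalized over tokens
    context = k_n.T @ v        # (d_k, d_v), cost O(n * d_k * d_v)
    return q_n @ context       # (n, d_v)

def quadratic_reference(q, k, v):
    """Same normalization, but builds the full n x n map first."""
    q_n = softmax(q, axis=1)
    k_n = softmax(k, axis=0)
    return (q_n @ k_n.T) @ v   # cost O(n^2 * d)

rng = np.random.default_rng(0)
n, d_k, d_v = 64, 8, 8         # assumed token count and channel sizes
q = rng.standard_normal((n, d_k))
k = rng.standard_normal((n, d_k))
v = rng.standard_normal((n, d_v))

out_linear = efficient_attention(q, k, v)
out_quadratic = quadratic_reference(q, k, v)
```

The two outputs agree up to floating-point error because matrix multiplication is associative; only the order of evaluation, and hence the cost, differs.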
Beyond Self-Attention: Deformable Large Kernel Attention for Medical Image Segmentation
Medical image segmentation has seen significant improvements with transformer
models, which excel in grasping far-reaching contexts and global contextual
information. However, the increasing computational demands of these models,
proportional to the squared token count, limit their depth and resolution
capabilities. Most current methods process 3D volumetric image data
slice-by-slice (called pseudo 3D), missing crucial inter-slice information and
thus reducing the model's overall performance. To address these challenges, we
introduce the concept of Deformable Large Kernel Attention (D-LKA
Attention), a streamlined attention mechanism employing large convolution
kernels to fully appreciate volumetric context. This mechanism operates within
a receptive field akin to self-attention while sidestepping the computational
overhead. Additionally, our proposed attention mechanism benefits from
deformable convolutions to flexibly warp the sampling grid, enabling the model
to adapt appropriately to diverse data patterns. We designed both 2D and 3D
adaptations of the D-LKA Attention, with the latter excelling in cross-depth
data understanding. Together, these components shape our novel hierarchical
Vision Transformer architecture, the D-LKA Net. Evaluations of our
model against leading methods on popular medical segmentation datasets
(Synapse, NIH Pancreas, and Skin lesion) demonstrate its superior performance.
Our code implementation is publicly available at:
https://github.com/mindflow-institue/deformableLK